ABSTRACT

The sequence of the 6408 nucleotides of bacteriophage
fd DNA has been determined. This allows us to deduce the
exact organisation of the filamentous phage genome and
provides easy access to DNA segments of known structure
and function.



INTRODUCTION 

Small DNA viruses depend largely on host functions during
their life cycle and are therefore preferred model systems
for the analysis of the organisation, expression and
replication of the more complex host genomes. Analysis of
viral genomes at the nucleotide level has become technically
possible with the development of new rapid DNA sequencing
techniques 1,2,3. Complete nucleotide sequences have been
reported so far for coli phage φX174 4,5 and Simian Virus
SV40. Here we report the sequence of bacteriophage fd
DNA, strain 478 (Heidelberg).

Phage fd, along with f1 and M13, belongs to a group of
closely related filamentous, male-specific coli phages (for
reviews see ref. 9, 10). Its genome is a single-stranded
circular DNA of about 6000 nucleotides which is converted
to a double-stranded form in the infected cell. Eight genes
have been ordered by combined genetic and biochemical
analysis within the phage genome. Its detailed organisation
remained, however, relatively uncertain due to the lack of
protein data for most gene products. Furthermore, analysis
at the nucleotide level had concentrated mainly on DNA
segments with regulatory functions 10, 11.
We have previously reported a preliminary nucleotide
sequence of fd DNA (11, and personal communications). The
aim of this publication is the rapid communication of the 
final sequence. A more detailed account containing the 
experimental evidence will be published elsewhere. 



RESULTS AND DISCUSSION 

Restriction nucleases and cleavage maps. The enzymes
used, their recognition sequences and the positions of
cleavage sites confirmed or newly established during this
work are presented in Fig. 1. All cleavage sites shown
have also been identified by sequencing the ends of
the respective restriction fragments. With one exception,
all parts of double-stranded fd DNA can be fragmented by
digestion with several of these enzymes into pieces of
less than 200 base-pairs.
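
The mapping of recognition sequences onto a circular genome can be illustrated with a minimal sketch. It assumes a short made-up sequence (`toy`, not fd DNA) and a hypothetical helper `find_sites`; the recognition sequences are those listed in Fig. 1, written with the IUPAC codes R (A or G), Y (C or T) and N (any base). Positions are reported for the first nucleotide of each site, as in Fig. 1.

```python
import re

# IUPAC nucleotide codes that occur in the recognition sequences of Fig. 1.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def find_sites(seq, site, circular=True):
    """Return 1-based positions of the first nucleotide of each match."""
    pattern = "".join(IUPAC[base] for base in site)
    # On a circular genome a site may span the point where the map was opened,
    # so the first len(site) - 1 nucleotides are appended to the end.
    text = seq + seq[:len(site) - 1] if circular else seq
    # The zero-width lookahead reports overlapping sites as well.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", text)]


# Hypothetical 21-nucleotide circle used only for illustration.
toy = "TCCAGCGCTTTGAATCAAGGA"

for name, site in [("HaeII", "RGCGCY"), ("HhaI", "GCGC"),
                   ("HinfI", "GANTC"), ("Sau3A", "GATC"),
                   ("BamHI", "GGATCC")]:
    print(name, site, find_sites(toy, site))
```

In this toy circle the BamHI and Sau3A sites span the opening point and are found only when `circular=True`. Each HaeII site contains an HhaI site one nucleotide further on, which is the same relationship seen between the HaeII and HhaI positions in Fig. 1.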

DNA sequencing. The chemical method of Maxam and
Gilbert 2 was used, which allowed us to read sequences of
up to 150 (occasionally up to 220) nucleotides. The sequences
obtained were stored and processed in a computer
(G. Osterburg and R. Sommer, to be published) to yield
the composite sequence of 6408 nucleotides presented in
Fig. 2. About 75 % of this sequence was determined from
both DNA strands in fd 478. Almost all of the remaining
25 % has also been sequenced in the second strand, but
in the closely related phage f1. Further information was
obtained for about 1000 nucleotides by RNA sequencing 12
and for about 600 nucleotides by the plus/minus method
of Sanger and Coulson 1. About 10 % of the fd sequence was
also established as recognition sequences for restriction
nucleases at known cleavage sites (Fig. 1 and unpublished
results).

Nucleotide sequence. According to Fig. 2, fd DNA is
composed of 6408 nucleotides (1578 A, 2210 T, 1325 G, 1295 C),
corresponding to a molecular weight of 2.12 x 10^6 daltons
(sodium salt). The sequence differs from that reported
earlier 11 mainly by an insert of 18 nucleotides in the






Nucleic Acids Research 



Fig. 1: Fragment maps of restriction nucleases used in the sequence analysis of fd DNA,
strain 478. The known maps for HpaII, HgaI, HaeII (HinHI), HaeIII and AluI 9-11 were confirmed
and refined. Maps for HhaI, HinfI, TaqI, BamHI, Sau3A (DpnI, MboI), EcoRII, MboII and
HphI were newly established (E.A. Auerswald et al., M. Takanami et al., both unpublished).
The first nucleotides of the recognition sites for the various restriction nucleases are
listed below. An additional HinfI site has been detected in fragment Hinf C (position [illegible])
in the DNA from fd ATCC (M. Takanami, unpublished). The circular phage DNA was opened at
the unique HindII (HpaI) cleavage site. The map includes the positions and the orientation
of the phage genes. IG is the intergenic space.



Enzyme   Recognition     Position of the first nucleotide of each site
         sequence

AluI     AGCT            29, 63, 229, 934, 1456, 1517, 2963, 3277, 3613, 4097,
                         5427, 5631, 5686, 6108, 6135, 6336
BamHI    GGATCC          2220, 5645
Sau3A    GATC            1362, 1714, 2221, 5646
EcoRII   CCTGG           1014, 1966
HaeII    RGCGCY          2710, 4743, 5560, 5568
HhaI     GCGC            44, 671, 1011, 1065, 1177, 1470, 2195, 2467, 2711, 3040,
                         3096, 3599, 4313, 4642, 4744, 4666, 4996, 5491, 5504,
                         5513, 5535, 5561, 5569
HaeIII   GGCC            1396, 2245, 2554, 5062, 5240, 5346, 5415, 5726, 5829,
                         6181
HgaI     GACGC, GCGTC    526, 2164, 2479, 3238, 4064, 5159
HinfI    GANTC           136, 216, 490, 511, 723, 1403, 2011, 2497, 2845, 3259,
                         3419, 3743, 3639, 4073, 4118, 4350, 5121, 5330, 5376,
                         5439, 5767, 5789, 6043, 6199
HpaII    CCGG            314, 966, 1095, 1924, 2376, 2390, 2396, 2552, 3371, 4019,
                         5615, 5996, 6119, 6179, 6221
HphI     GGTGA, TCACC    1376, 1503, 1774, 1909, 2398, 2542, 2561, 2620, 2626,
                         2635, 3740, 4347, 4365, 4848, 5116, 5707, 6163, 6169,
                         6266
MboII    GAAGA, TCTTC    3529, 3913, 4076, 4272, 4938, 5256, 5568
TaqI     TCGA            336, 966, 1127, 1508, 1949, 2528, 2815, 4834, 5684, 6041










repetitive sequence around position 2380. Except for a
G → A transition at position 1859, the identical sequence
was obtained in 2000 nucleotides from another fd strain
(ATCC).
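
The base composition and molecular weight quoted above can be checked with a short calculation. This is a sketch under stated assumptions: the average residue masses of the four deoxynucleotides within a chain, and the mass gained when one proton per phosphate is replaced by sodium, are approximate textbook values that are not given in the text. Because the molecule is circular, no correction for chain ends is applied.

```python
# Base counts as stated in the text.
counts = {"A": 1578, "T": 2210, "G": 1325, "C": 1295}

# Assumed average masses (daltons) of nucleotide residues within a DNA chain,
# in the free-acid form. Approximate values, not taken from the paper.
residue_mass = {"A": 313.21, "T": 304.20, "G": 329.21, "C": 289.18}

# Assumed mass difference when Na replaces H on each phosphate (sodium salt).
SODIUM_MINUS_H = 21.98

total = sum(counts.values())
mw = sum(n * (residue_mass[base] + SODIUM_MINUS_H)
         for base, n in counts.items())

print("total nucleotides:", total)
for base, n in counts.items():
    print(f"{base}: {n} ({100 * n / total:.1f} %)")
print(f"molecular weight (sodium salt): {mw / 1e6:.2f} x 10^6 daltons")
```

With these assumed masses the four counts add up to the stated genome length and the computed weight rounds to the stated 2.12 x 10^6 daltons.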

The nucleotide sequence of the related phage f1 has
been determined to about 90 % (E. Beck, unpublished). It
differs from the fd sequence by deletion of a single
nucleotide (position 3195) and by about 160 base changes.
Except for seven, these are all silent mutations which
do not alter the amino acid sequence of the fd gene
products.
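
Whether a base change is silent can be decided by translating the codon before and after the change. The sketch below assumes the standard genetic code and uses two made-up codon pairs; the helper name `is_silent` is hypothetical, and the actual f1 and fd codons are not reproduced here.

```python
# Standard genetic code, with codons enumerated in the order T, C, A, G
# for the first, second and third position. "*" marks a stop codon.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_silent(codon_a, codon_b):
    """True if both codons encode the same amino acid (or both are stops)."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


# Illustrative codon pairs, not taken from the f1/fd comparison.
print(is_silent("GCT", "GCC"))  # third-position change, both alanine
print(is_silent("GAA", "GAT"))  # glutamic acid versus aspartic acid
```

Applied to each of the roughly 160 differences in its reading frame, a test of this kind separates the silent changes from the seven that alter a gene product.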

Genome organisation. Analysis of the fd DNA sequence
for continuous translational reading frames, combined
with the information obtained from the sequence of amber
mutations in f1 and M13 (E. Beck, unpublished; J. Schoenmakers,
personal communication) and from the silent base
changes in f1, allows us to deduce the exact sizes and
positions of the eight known gene products and of known
regulatory signals. The DNA sequence predicts the amino
acid sequences of known and unknown gene products, and
the existence of a new gene (gene IX) in the intergenic
space between genes VII and VIII 11.
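
A search for continuous translational reading frames can be sketched as follows. This is a simplified illustration, not the procedure used for fd: it scans only the three frames of one strand, treats the sequence as linear, accepts only ATG as a start codon, and runs on a made-up sequence. The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop reading frame.

    Positions are 1-based; end includes the stop codon; frame is 1, 2 or 3.
    min_codons is the minimum number of codons before the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i                      # first ATG opens the frame
            elif codon in STOP_CODONS and start is not None:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None                   # look for the next start
    return sorted(orfs)


# Hypothetical 24-nucleotide sequence containing two short reading frames.
toy = "CCATGAAATTTTAGGATGCCCTGA"
print(find_orfs(toy))
```

For a real genome the minimum length would be set much higher, and known amber mutations or silent base changes would then be used, as described above, to decide which candidate frames are actually translated.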

According to our analysis (Fig. 2) the overall organisation
of the filamentous phage genome differs markedly
from that of icosahedral single-stranded DNA phages
like φX174 4: although genes are generally closely spaced,
there is only a single short overlap of genes in different
reading frames (at the junction of genes I and IV).
In addition there is an intergenic region (IG) of 508
nucleotides which harbours the origins of DNA
replication 13,14. Recent experiments show that this space can be
further expanded by insertion of foreign DNA 16.

Applications. fd DNA is accessible in high yields in
both its single-stranded and double-stranded form 10. The
knowledge of its nucleotide sequence and of the map
positions of a great number of restriction sites therefore
provides easy access to well defined DNA molecules which
can be used in different investigations on DNA structure
and function. For example, they have been used as size
markers in their intact or restricted form, in the
search for recognition sequences of restriction
nucleases 17, in the site-specific modification of the
fd genome for use as a cloning vehicle 18, for the
isolation and the cloning of regulatory signals from
fd DNA, for the analysis of integration and loss of
transposon Tn5 (16; E.A. Auerswald, to be published),
and for the correlation of thermal denaturation profiles
of DNA molecules with their nucleotide sequence 16.